Learning to Identify Regular Expressions that Describe Email Campaigns

Authors

  • Paul Prasse
  • Christoph Sawade
  • Niels Landwehr
  • Tobias Scheffer
Abstract

This paper addresses the problem of inferring a regular expression from a given set of strings that resembles, as closely as possible, the regular expression that a human expert would have written to identify the language. This is motivated by our goal of automating the task of postmasters of an email service who use regular expressions to describe and blacklist email spam campaigns. Training data contains batches of messages and corresponding regular expressions that an expert postmaster is confident enough to blacklist. We model this task as a learning problem with structured output spaces and an appropriate loss function, derive a decoder and the resulting optimization problem, and report on a case study conducted with an email service.
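To make the input and output of the task concrete, the sketch below turns a batch of message strings into one regular expression with a naive column-wise heuristic. It is not the structured-output learner, loss function or decoder described in the abstract; it is a hypothetical baseline under stated assumptions. The function names `generalize` and `char_class` are illustrative, and the sketch assumes that messages are tokenized on whitespace and that every message in a batch has the same number of tokens.

```python
import re
from typing import List


def char_class(tokens: List[str]) -> str:
    """Hypothetical helper: pick the narrowest of a few fixed character
    classes that covers every token, with a length quantifier spanning
    the observed token lengths."""
    lengths = [len(t) for t in tokens]
    lo, hi = min(lengths), max(lengths)
    quant = f"{{{lo}}}" if lo == hi else f"{{{lo},{hi}}}"
    joined = "".join(tokens)
    if re.fullmatch(r"[0-9]+", joined):
        cls = "[0-9]"
    elif re.fullmatch(r"[a-z]+", joined):
        cls = "[a-z]"
    elif re.fullmatch(r"[A-Za-z0-9]+", joined):
        cls = "[A-Za-z0-9]"
    else:
        cls = r"\S"  # any non-whitespace character
    return cls + quant


def generalize(batch: List[str]) -> str:
    """Hypothetical column-wise baseline: tokens shared by all messages
    become escaped literals; varying positions become a character class.
    Assumes whitespace tokenization and equal token counts."""
    token_lists = [s.split() for s in batch]
    if len({len(toks) for toks in token_lists}) != 1:
        raise ValueError("this naive baseline needs a non-empty batch "
                         "with equal token counts")
    parts = []
    for column in zip(*token_lists):
        if len(set(column)) == 1:
            parts.append(re.escape(column[0]))
        else:
            parts.append(char_class(list(column)))
    return r"\s+".join(parts)


batch = [
    "Cheap meds order 4512 now",
    "Cheap meds order 98 now",
    "Cheap meds order 731 now",
]
pattern = generalize(batch)
print(pattern)  # Cheap\s+meds\s+order\s+[0-9]{2,4}\s+now
assert all(re.fullmatch(pattern, s) for s in batch)
```

A heuristic like this one applies the same fixed generalization rule to every batch, so it has no way to learn which generalization an expert postmaster would actually write; it also breaks as soon as messages in a campaign differ in length. Learning from batches paired with expert-written expressions, as the abstract proposes, targets exactly that gap.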


Similar resources

Learning to identify concise regular expressions that describe email campaigns

This paper addresses the problem of inferring a regular expression from a given set of strings that resembles, as closely as possible, the regular expression that a human expert would have written to identify the language. This is motivated by our goal of automating the task of postmasters who use regular expressions to describe and blacklist email spam campaigns. Training data contains batches...


Prologue: A machine learning sampler

You may not be aware of it, but chances are that you are already a regular user of machine learning technology. Most current e-mail clients incorporate algorithms to identify and filter out spam e-mail, also known as junk e-mail or unsolicited bulk e-mail. Early spam filters relied on hand-coded pattern matching techniques such as regular expressions, but it soon became apparent that this is h...


Algorithms for Learning Regular Expressions

We describe algorithms that directly infer regular expressions from positive data and characterize the regular language classes that can be learned this way.


Learning of regular expressions by pattern matching

We are considering the problem of restoring regular expressions from representative examples. We describe a natural learning algorithm for obtaining a "plausible" regular expression from one example. The algorithm is based on finding the longest substring which can be matched by some part of the so far obtained expression. We believe that the algorithm to a certain extent mimics humans guessing r...


Learning and Using Formal Language

Regular expressions are a convenient and simple notation for expressing the members of a regular language. This simplicity enables regular expressions to serve as models of more complex context-free programming languages. To examine techniques in teaching programming, we exposed first-year students with no knowledge of regular expressions to two tasks. The tasks asked participants to identify o...



Journal title:
  • CoRR

Volume abs/1206.4637

Publication date 2012